VPU Enabling and Usage
Problems running the pipelines shown on this page? Please see our GStreamer Debugging guide for help.
Introduction
The VPU, or Video Processing Unit, is the hardware block that accelerates video codec operations. It encodes and decodes video in dedicated hardware instead of on the CPU, which reduces CPU load.
Encoder controls
The VPU encoder exposes additional V4L2 controls that can be used to modify the behavior of the hardware encoder. These controls are useful for configuring bitrate, GOP structure, latency, slice settings, and stream formatting.
To see the available encoder controls, run:
v4l2-ctl -d /dev/video33 --list-ctrls
| Control | Description |
|---|---|
| video_bitrate | Sets the target encoder bitrate. |
| video_peak_bitrate | Sets the maximum peak bitrate. |
| video_bitrate_mode | Selects the bitrate mode, such as variable bitrate. |
| video_gop_size | Sets the GOP size, or distance between keyframes. |
| video_gop_closure | Enables closed GOP behavior. |
| video_b_frames | Enables or disables B-frames. |
| force_key_frame | Forces the encoder to generate a keyframe. |
| lowlatency_mode | Enables low-latency encoding mode. |
| frame_level_rate_control_enable | Enables frame-level rate control. |
| slice_partitioning_method | Selects the slice partitioning method. |
| maximum_bytes_in_a_slice | Sets the maximum number of bytes per slice. |
| number_of_mbs_in_a_slice | Sets the number of macroblocks per slice. |
| intra_refresh_period | Sets the intra-refresh period. |
| prepend_sps_and_pps_to_idr | Prepends SPS/PPS headers to IDR frames. |
| generate_access_unit_delimiters | Adds access unit delimiters to the encoded stream. |
| complexity | Sets the encoder complexity level. |
| hevc_enc_without_startcode | Controls whether HEVC output is generated without start codes. |
| hevc_size_of_length_field | Sets the HEVC length field size. |
These controls can be passed to GStreamer through the extra-controls property when using the V4L2 encoder element.
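As a minimal sketch of the extra-controls property, the script below assembles a control string from two of the controls listed above and prints a gst-launch-1.0 command that uses it. The bitrate, GOP size, resolution, and output file name are illustrative assumptions, not recommended settings, and the sketch assumes the element accepts the control names exactly as v4l2-ctl prints them.

```shell
#!/bin/sh
# Assumed example values; check the ranges reported by
# `v4l2-ctl -d /dev/video33 --list-ctrls` before using them.
BITRATE=4000000
GOP_SIZE=30

# extra-controls takes a GstStructure: a structure name ("controls" here)
# followed by comma-separated control=value pairs.
EXTRA_CONTROLS="controls,video_bitrate=${BITRATE},video_gop_size=${GOP_SIZE}"

# Hypothetical test pipeline; controls_test.mp4 is a placeholder file name.
PIPELINE="videotestsrc num-buffers=300 ! video/x-raw,width=1280,height=720,framerate=30/1 ! v4l2h264enc extra-controls=\"${EXTRA_CONTROLS}\" ! h264parse ! mp4mux ! filesink location=controls_test.mp4"

# Print the command so it can be reviewed before it is run on the board.
echo "gst-launch-1.0 ${PIPELINE}"
```

The script only prints the command. Copy the printed line to the target shell to run it, and adjust the values to match the limits the driver reports.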
Decoder controls
The VPU decoder also exposes V4L2 controls. These controls are mainly used to configure decode behavior, display delay, low-latency operation, timestamp handling, metadata output, and buffer management.
To see the available decoder controls, run:
v4l2-ctl -d /dev/video32 --list-ctrls
| Control | Description |
|---|---|
| decoder_slice_interface | Enables or disables decoder slice interface mode. |
| display_delay | Sets the display delay value. |
| display_delay_enable | Enables display delay control. |
| lowlatency_mode | Enables low-latency decode mode. |
| frame_rate | Sets the decoder frame rate value. |
| operating_rate | Sets the decoder operating rate. |
| ts_reorder | Enables or disables timestamp reordering. |
| max_num_reorder_frames | Reports or controls the maximum number of reordered frames. |
| coded_frames | Reports coded-frame information. |
| thumbnail_mode | Enables thumbnail decode mode. |
| priority | Sets the decoder priority. |
| secure_mode | Enables secure decode mode. |
| codec_config | Controls codec configuration handling. |
| bitstream_size_overwrite | Overrides the bitstream size. |
| meta_timestamp | Enables timestamp metadata. |
| meta_picture_type | Enables picture-type metadata. |
| meta_dec_qp_metadata | Enables decoder QP metadata. |
| meta_concealed_mb_cnt | Enables concealed macroblock count metadata. |
| meta_interlace | Enables interlace metadata. |
| last_flag_event_enable | Enables last-flag event signaling. |
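Before relying on a control in a script, it can help to confirm that the driver on the target actually lists it. The helper below is a small sketch of that check. The name has_ctrl is hypothetical, and the sketch assumes the usual v4l2-ctl listing layout, where each line holds the control name followed by a hexadecimal ID.

```shell
#!/bin/sh
# has_ctrl NAME: read `v4l2-ctl --list-ctrls` output on stdin and succeed
# only if a control called NAME is listed.
# Assumes lines shaped like: "<name> 0x<id> (<type>) : ..."
has_ctrl() {
    grep -Eq "^[[:space:]]*$1[[:space:]]+0x[0-9a-fA-F]+"
}

# Usage on the target (device nodes as used on this page):
#   v4l2-ctl -d /dev/video32 --list-ctrls | has_ctrl lowlatency_mode && echo "supported"
#   v4l2-ctl -d /dev/video33 --list-ctrls | has_ctrl video_bitrate && echo "supported"
```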
VPU use verification
These elements were tested in both the downloaded Ubuntu sources (Ubuntu 24.04.4) and the Yocto-built image. VPU use was verified through the debug logs that GStreamer generates, for multiple pipelines. For example:
GST_DEBUG="v4l2*:4,*h265*:4,*codec*:4" GST_DEBUG_FILE=/tmp/gst-vpu.log gst-launch-1.0 filesrc location=camera-720p30-h265.mp4 ! qtdemux ! h265parse ! v4l2h265dec capture-io-mode=dmabuf output-io-mode=dmabuf ! waylandsink -v ; grep -Ei "v4l2h265dec|v4l2.*decoder|/dev/video3[2-3]|VIDIOC_STREAMON" /tmp/gst-vpu.log
The expected output looks like this:
0:00:00.052985053 4026 0xaaaaead39a50 INFO v4l2 v4l2_calls.c:592:gst_v4l2_open:<v4l2h265dec0:sink> Opened device 'msm_vidc_decoder' (/dev/video32) successfully
0:00:00.053088127 4026 0xaaaaead39a50 INFO v4l2 v4l2_calls.c:688:gst_v4l2_dup:<v4l2h265dec0:src> Cloned device 'msm_vidc_decoder' (/dev/video32) successfully
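The same check can be scripted. The sketch below scans a GStreamer debug log for the "Opened device ... successfully" message shown above and prints each driver name with its device node. The function name vpu_devices is hypothetical, and the sketch assumes the log lines keep the format shown in the expected output.

```shell
#!/bin/sh
# vpu_devices LOGFILE: list the V4L2 devices that GStreamer reported opening,
# one "<driver name> <device node>" pair per line, without duplicates.
vpu_devices() {
    grep -E "Opened device '[^']+' \(/dev/video[0-9]+\) successfully" "$1" \
        | sed -E "s#.*Opened device '([^']+)' \((/dev/video[0-9]+)\).*#\1 \2#" \
        | sort -u
}

# Example: inspect the log written through GST_DEBUG_FILE, if it exists.
LOG=/tmp/gst-vpu.log
if [ -f "$LOG" ]; then
    vpu_devices "$LOG"
fi
```

If the function prints nothing, no element opened a V4L2 device, which suggests the pipeline did not use the hardware codec.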
All of these encoders and decoders expose the V4L2 I/O tuning properties capture-io-mode and output-io-mode. For the Dragonwing 9075 EVK, these properties can be configured as dmabuf to take advantage of hardware buffer sharing and reduce unnecessary memory copies between hardware-accelerated pipeline elements.
capture-io-mode: Configures the I/O mode used by the decoder capture queue. This corresponds to the decoder output side, or the buffers produced by the decoder on its src pad. It is configured as auto by default. It can be changed to:
- auto: Lets GStreamer and the V4L2 driver choose the I/O mode. Use this default when validating a pipeline or debugging negotiation issues.
- rw: Uses standard read/write system calls. Use it only for basic debugging or compatibility testing.
- mmap: Uses memory-mapped buffers allocated by the V4L2 driver. Useful when DMABUF negotiation is not available.
- userptr: Uses user-allocated buffers passed to the V4L2 driver.
- dmabuf: Uses DMA buffer file descriptors for buffer sharing. This mode is recommended for Dragonwing 9075 EVK hardware-accelerated pipelines when downstream elements support DMABUF.
- dmabuf-import: Imports externally allocated DMA buffers into the V4L2 element. Use this mode when another element or subsystem owns the buffers and the decoder should import them.
output-io-mode: Configures the I/O mode used by the decoder output queue. This corresponds to the decoder input side, or the compressed buffers consumed by the decoder on its sink pad. It is configured as auto by default. It can be changed to:
- auto: Lets GStreamer and the V4L2 driver choose the most appropriate I/O mode.
- rw: Uses standard read/write access.
- mmap: Uses memory-mapped buffers.
- userptr: Uses application-provided memory.
- dmabuf: Uses DMA buffer file descriptors.
- dmabuf-import: Imports DMA buffers provided by an upstream element.
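Since mmap is described above as the option to use when DMABUF negotiation is not available, one way to handle both cases is to try dmabuf first and fall back to mmap if the pipeline exits with an error. The sketch below illustrates that idea. The function names decode_cmd and try_modes are hypothetical, and a non-zero exit from gst-launch-1.0 can have causes other than buffer negotiation, so treat the result as a hint rather than a diagnosis.

```shell
#!/bin/sh
# decode_cmd FILE MODE: print an H.264 MP4 decode command that uses MODE
# for both capture-io-mode and output-io-mode.
decode_cmd() {
    echo "gst-launch-1.0 filesrc location=$1 ! qtdemux ! h264parse ! v4l2h264dec capture-io-mode=$2 output-io-mode=$2 ! waylandsink"
}

# try_modes FILE: run the pipeline with dmabuf, then with mmap, and print
# the first mode whose pipeline exits successfully.
try_modes() {
    for mode in dmabuf mmap; do
        cmd=$(decode_cmd "$1" "$mode")
        echo "Trying: $cmd" >&2
        if sh -c "$cmd"; then
            echo "$mode"
            return 0
        fi
    done
    return 1
}

# Usage on the target:
#   try_modes input_h264.mp4
```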
Testing VPU Decoders
The hardware decoders tested in this section include: v4l2h264dec, v4l2h265dec, v4l2vp9dec, and v4l2av1dec.
H.264 MP4 decode
gst-launch-1.0 filesrc location=input_h264.mp4 ! qtdemux ! h264parse ! v4l2h264dec capture-io-mode=dmabuf output-io-mode=dmabuf ! waylandsink -v
- Expected result: plays H.264 video
H.264 MKV decode
gst-launch-1.0 filesrc location=input_h264.mkv ! matroskademux ! h264parse ! v4l2h264dec capture-io-mode=dmabuf output-io-mode=dmabuf ! waylandsink -v
- Expected result: plays H.264 MKV video
H.265 MP4 decode
gst-launch-1.0 filesrc location=input_h265.mp4 ! qtdemux ! h265parse ! v4l2h265dec capture-io-mode=dmabuf output-io-mode=dmabuf ! waylandsink -v
- Expected result: plays H.265 video
H.265 MKV decode
gst-launch-1.0 filesrc location=input_h265.mkv ! matroskademux ! h265parse ! v4l2h265dec capture-io-mode=dmabuf output-io-mode=dmabuf ! waylandsink -v
- Expected result: plays H.265 MKV video
VP9 decode
gst-launch-1.0 filesrc location=input_vp9.webm ! matroskademux ! vp9parse ! v4l2vp9dec capture-io-mode=dmabuf output-io-mode=dmabuf ! waylandsink -v
- Expected result: plays VP9 video
AV1 decode
gst-launch-1.0 filesrc location=input_av1.mp4 ! qtdemux ! av1parse ! v4l2av1dec capture-io-mode=dmabuf output-io-mode=dmabuf ! waylandsink -v
- Expected result: plays AV1 video
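The decode pipelines above differ only in the demuxer, parser, and decoder element. As a sketch of that pattern, the helper below derives the command from a codec name and a file name. The function name decode_pipeline is hypothetical, and the mapping covers only the containers and codecs tested in this section.

```shell
#!/bin/sh
# decode_pipeline CODEC FILE: print the decode command for one test clip.
# CODEC is one of h264, h265, vp9, av1; the container is taken from the
# file extension.
decode_pipeline() {
    codec=$1
    file=$2
    case "$file" in
        *.mp4)        demux=qtdemux ;;
        *.mkv|*.webm) demux=matroskademux ;;
        *) echo "unsupported container: $file" >&2; return 1 ;;
    esac
    case "$codec" in
        h264|h265|vp9|av1) ;;
        *) echo "unsupported codec: $codec" >&2; return 1 ;;
    esac
    echo "gst-launch-1.0 filesrc location=$file ! $demux ! ${codec}parse ! v4l2${codec}dec capture-io-mode=dmabuf output-io-mode=dmabuf ! waylandsink -v"
}

# Print one command per clip used in this section.
decode_pipeline h264 input_h264.mp4
decode_pipeline h264 input_h264.mkv
decode_pipeline h265 input_h265.mp4
decode_pipeline h265 input_h265.mkv
decode_pipeline vp9 input_vp9.webm
decode_pipeline av1 input_av1.mp4
```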
Testing VPU Encoders
The hardware encoders tested in this section include: v4l2h264enc and v4l2h265enc.
Create H.264 MP4
gst-launch-1.0 videotestsrc num-buffers=300 pattern=smpte ! video/x-raw,width=1920,height=1080,framerate=30/1 ! v4l2h264enc output-io-mode=dmabuf ! h264parse ! mp4mux ! filesink location=input_h264.mp4 -v
- Expected result: creates H.264 MP4 video
Create H.264 MKV
gst-launch-1.0 videotestsrc num-buffers=300 pattern=snow ! video/x-raw,width=1280,height=720,framerate=30/1 ! v4l2h264enc output-io-mode=dmabuf ! h264parse ! matroskamux ! filesink location=input_h264.mkv -v
- Expected result: creates H.264 MKV video
Create H.265 MP4
gst-launch-1.0 videotestsrc num-buffers=300 pattern=smpte ! video/x-raw,width=1920,height=1080,framerate=30/1 ! v4l2h265enc output-io-mode=dmabuf ! h265parse ! mp4mux ! filesink location=input_h265.mp4 -v
- Expected result: creates H.265 MP4 video
Create H.265 MKV
gst-launch-1.0 videotestsrc num-buffers=300 pattern=snow ! video/x-raw,width=1280,height=720,framerate=30/1 ! v4l2h265enc output-io-mode=dmabuf ! h265parse ! matroskamux ! filesink location=input_h265.mkv -v
- Expected result: creates H.265 MKV video
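A quick sanity check for the generated files is their duration. Every encode pipeline above uses num-buffers=300 at 30/1, so each clip should last about 10 seconds. The sketch below does that arithmetic. The variable names are illustrative, and the gst-discoverer-1.0 call in the comment is only a suggestion that assumes the tool is installed on the target.

```shell
#!/bin/sh
# Values taken from the encode pipelines above.
NUM_BUFFERS=300
FPS=30

# videotestsrc emits one frame per buffer, so duration = frames / frame rate.
EXPECTED_SECONDS=$((NUM_BUFFERS / FPS))
echo "Expected clip duration: ${EXPECTED_SECONDS} s"

# On the target, compare this with the duration and codec reported for a
# generated file, for example:
#   gst-discoverer-1.0 input_h264.mp4
```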
Performance
- For encoder performance, see the Video Encoding page.
- For decoder performance, see the Video Decoding page.
Glass-to-Glass Measurements
This section documents glass-to-glass latency measurements using different display sinks. In the case of kmssink, the display manager was stopped to fully own the display.
The camera used is a MIPI IMX577.
Pipeline using waylandsink:
gst-launch-1.0 -e qtiqmmfsrc name=camsrc camera=0 ! 'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1,interlace-mode=progressive,colorimetry=bt601' ! waylandsink
Pipeline using kmssink:
Before running the tests with kmssink, stop the graphical interface and switch to multi-user mode:
sudo systemctl isolate multi-user.target
Then run:
gst-launch-1.0 -e qtiqmmfsrc name=camsrc camera=0 ! 'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1,interlace-mode=progressive,colorimetry=bt601' ! kmssink
Results
| Sink | 4K latency | 720p latency |
|---|---|---|
| kmssink | 0.147339038 s | 0.108750446 s |
| waylandsink | 0.15126692 s | 0.12861633 s |
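To make the gap between the two sinks easier to read, the sketch below converts the measured values into a difference in milliseconds. The function name diff_ms is hypothetical; the inputs are the figures from the results table.

```shell
#!/bin/sh
# diff_ms A B: print (B - A) in milliseconds; A and B are given in seconds.
diff_ms() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", (b - a) * 1000 }'
}

# waylandsink latency minus kmssink latency, per resolution.
echo "4K:   $(diff_ms 0.147339038 0.15126692) ms"
echo "720p: $(diff_ms 0.108750446 0.12861633) ms"
```

With these measurements, kmssink is roughly 3.9 ms faster at 4K and roughly 19.9 ms faster at 720p.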